AITopics | multi-agent mdp

Collaborating Authors

multi-agent mdp

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fairness in Multi-Agent Sequential Decision-Making

Chongjie Zhang, Julie A. Shah

Neural Information Processing SystemsOct-2-2025, 21:37:31 GMT

We define a fairness solution criterion for multi-agent decision-making problems, where agents have local interests. This new criterion aims to maximize the worst performance of agents with a consideration on the overall performance. We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. This game-theoretic approach formulates this fairness optimization as a two-player zero-sum game and employs an iterative algorithm for finding a Nash equilibrium, corresponding to an optimal fairness policy.

agent, criterion, multi-agent mdp, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Slovenia (0.04)

Industry: Telecommunications (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Add feedback

Multi-agent Markov Entanglement

Chen, Shuze, Peng, Tianyi

arXiv.org Machine LearningJun-30-2025

Value decomposition has long been a fundamental technique in multi-agent dynamic programming and reinforcement learning (RL). Specifically, the value function of a global state $(s_1,s_2,\ldots,s_N)$ is often approximated as the sum of local functions: $V(s_1,s_2,\ldots,s_N)\approx\sum_{i=1}^N V_i(s_i)$. This approach traces back to the index policy in restless multi-armed bandit problems and has found various applications in modern RL systems. However, the theoretical justification for why this decomposition works so effectively remains underexplored. In this paper, we uncover the underlying mathematical structure that enables value decomposition. We demonstrate that a multi-agent Markov decision process (MDP) permits value decomposition if and only if its transition matrix is not "entangled" -- a concept analogous to quantum entanglement in quantum physics. Drawing inspiration from how physicists measure quantum entanglement, we introduce how to measure the "Markov entanglement" for multi-agent MDPs and show that this measure can be used to bound the decomposition error in general multi-agent MDPs. Using the concept of Markov entanglement, we proved that a widely-used class of index policies is weakly entangled and enjoys a sublinear $\mathcal O(\sqrt{N})$ scale of decomposition error for $N$-agent systems. Finally, we show how Markov entanglement can be efficiently estimated in practice, providing practitioners with an empirical proxy for the quality of value decomposition.

artificial intelligence, data mining, machine learning, (13 more...)

arXiv.org Machine Learning

2506.02385

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > New Zealand (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Kenya (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Thinking Beyond Visibility: A Near-Optimal Policy Framework for Locally Interdependent Multi-Agent MDPs

DeWeese, Alex, Qu, Guannan

arXiv.org Artificial IntelligenceJun-5-2025

Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are known to be NEXP-Complete and intractable to solve. However, for problems such as cooperative navigation, obstacle avoidance, and formation control, basic assumptions can be made about local visibility and local dependencies. The work DeWeese and Qu 2024 formalized these assumptions in the construction of the Locally Interdependent Multi-Agent MDP. In this setting, it establishes three closed-form policies that are tractable to compute in various situations and are exponentially close to optimal with respect to visibility. However, it is also shown that these solutions can have poor performance when the visibility is small and fixed, often getting stuck during simulations due to the so called "Penalty Jittering" phenomenon. In this work, we establish the Extended Cutoff Policy Class which is, to the best of our knowledge, the first non-trivial class of near optimal closed-form partially observable policies that are exponentially close to optimal with respect to the visibility for any Locally Interdependent Multi-Agent MDP. These policies are able to remember agents beyond their visibilities which allows them to perform significantly better in many small and fixed visibility settings, resolve Penalty Jittering occurrences, and under certain circumstances guarantee fully observable joint optimal behavior despite the partial observability. We also propose a generalized form of the Locally Interdependent Multi-Agent MDP that allows for transition dependence and extended reward dependence, then replicate our theoretical results in this setting.

agent, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2506.04215

Genre: Research Report (0.64)

Industry: Transportation (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Fairness in Multi-Agent Sequential Decision-Making

Chongjie Zhang, Julie A. Shah

Neural Information Processing SystemsFeb-9-2025, 08:36:44 GMT

artificial intelligence, criterion, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Slovenia (0.04)

Industry: Telecommunications (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Add feedback

Simulation-Based Optimistic Policy Iteration For Multi-Agent MDPs with Kullback-Leibler Control Cost

Nakhleh, Khaled, Eksin, Ceyhun, Ekin, Sabit

arXiv.org Artificial IntelligenceOct-19-2024

This paper proposes an agent-based optimistic policy iteration (OPI) scheme for learning stationary optimal stochastic policies in multi-agent Markov Decision Processes (MDPs), in which agents incur a Kullback-Leibler (KL) divergence cost for their control efforts and an additional cost for the joint state. The proposed scheme consists of a greedy policy improvement step followed by an m-step temporal difference (TD) policy evaluation step. We use the separable structure of the instantaneous cost to show that the policy improvement step follows a Boltzmann distribution that depends on the current value function estimate and the uncontrolled transition probabilities. This allows agents to compute the improved joint policy independently. We show that both the synchronous (entire state space evaluation) and asynchronous (a uniformly sampled set of substates) versions of the OPI scheme with finite policy evaluation rollout converge to the optimal value function and an optimal joint policy asymptotically.

artificial intelligence, iteration, machine learning, (10 more...)

arXiv.org Artificial Intelligence

2410.15156

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies

DeWeese, Alex, Qu, Guannan

arXiv.org Artificial IntelligenceJun-10-2024

Many multi-agent systems in practice are decentralized and have dynamically varying dependencies. There has been a lack of attempts in the literature to analyze these systems theoretically. In this paper, we propose and theoretically analyze a decentralized model with dynamically varying dependencies called the Locally Interdependent Multi-Agent MDP. This model can represent problems in many disparate domains such as cooperative navigation, obstacle avoidance, and formation control. Despite the intractability that general partially observable multi-agent systems suffer from, we propose three closed-form policies that are theoretically near-optimal in this setting and can be scalable to compute and store. Consequentially, we reveal a fundamental property of Locally Interdependent Multi-Agent MDP's that the partially observable decentralized solution is exponentially close to the fully observable solution with respect to the visibility radius. We then discuss extensions of our closed-form policies to further improve tractability. We conclude by providing simulations to investigate some long horizon behaviors of our closed-form policies.

agent, interdependent multi-agent mdp, multi-agent mdp, (15 more...)

arXiv.org Artificial Intelligence

2406.06823

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre:

Research Report (0.50)
Workflow (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Fairness in Multi-Agent Sequential Decision-Making

Neural Information Processing SystemsMar-13-2024, 09:48:23 GMT

agent, criterion, fairness policy, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Slovenia (0.04)

Industry: Telecommunications (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Add feedback

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

Emami, Patrick, Zhang, Xiangyu, Biagioni, David, Zamzam, Ahmed S.

arXiv.org Artificial IntelligenceJul-17-2023

In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2307.08794

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Colorado > Jefferson County > Golden (0.04)
North America > United States > Colorado > Broomfield County > Broomfield (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.64)

Industry:

Energy > Renewable (0.68)
Energy > Power Industry (0.66)
Government > Regional Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout

Fiscko, Carmel, Kar, Soummya, Sinopoli, Bruno

arXiv.org Artificial IntelligenceApr-24-2023

This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout and the computation of policies for the post-dropout system based on control and sampling of the pre-dropout system. The controller's objective is to find an optimal policy that maximizes the value of the expected system given a priori knowledge of the agents' dropout probabilities. Finding an optimal policy for any specific dropout realization is a special case of this problem. For MDPs with a certain transition independence and reward separability structure, we assume that removing agents from the system forms a new MDP comprised of the remaining agents with new state and action spaces, transition dynamics that marginalize the removed agents, and rewards that are independent of the removed agents. We first show that under these assumptions, the value of the expected post-dropout system can be represented by a single MDP; this "robust MDP" eliminates the need to evaluate all $2^N$ realizations of the system, where $N$ denotes the number of agents. More significantly, in a model-free context, it is shown that the robust MDP value can be estimated with samples generated by the pre-dropout system, meaning that robust policies can be found before dropout occurs. This fact is used to propose a policy importance sampling (IS) routine that performs policy evaluation for dropout scenarios while controlling the existing system with good pre-dropout policies. The policy IS routine produces value estimates for both the robust MDP and specific post-dropout system realizations and is justified with exponential confidence bounds. Finally, the utility of this approach is verified in simulation, showing how structural properties of agent dropout can help a controller find good post-dropout policies before dropout occurs.

agent, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2304.12458

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

On Fairness in Decision-Making under Uncertainty: Definitions, Computation, and Comparison

Zhang, Chongjie (Massachusetts Institute of Technology) | Shah, Julie A. (Massachusetts Institute of Technology)

AAAI ConferencesMar-6-2015

The utilitarian solution criterion, which has been extensively studied in multi-agent decision making under uncertainty, aims to maximize the sum of individual utilities. However, as the utilitarian solution often discriminates against some agents, it is not desirable for many practical applications where agents have their own interests and fairness is expected. To address this issue, this paper introduces egalitarian solution criteria for sequential decision-making under uncertainty, which are based on the maximin principle. Motivated by different application domains, we propose four maximin fairness criteria and develop corresponding algorithms for computing their optimal policies. Furthermore, we analyze the connections between these criteria and discuss and compare their characteristics.

artificial intelligence, criterion, fairness, (17 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback